Overview

Dataset statistics

Number of variables: 12
Number of observations: 3187
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 298.9 KiB
Average record size in memory: 96.0 B
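As a rough cross-check of the memory figures above, the sketch below assumes that every column stores one 8-byte value per row. That is an assumption; the report does not state the column dtypes. Under it, one column of 3187 rows takes about 24.9 KiB and one record of 12 values takes 96 B. The 12 columns together come to about 298.8 KiB, and the small gap to the reported 298.9 KiB is presumably index overhead.

```python
# Figures taken from the dataset statistics in this report.
n_rows = 3187
n_columns = 12

# Assumption (not stated in the report): 8 bytes per stored value.
bytes_per_value = 8

column_bytes = n_rows * bytes_per_value
table_bytes = column_bytes * n_columns

print(f"per column:  {column_bytes / 1024:.1f} KiB")
print(f"all columns: {table_bytes / 1024:.1f} KiB")
print(f"per record:  {table_bytes / n_rows:.1f} B")
```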

Variable types

NUM: 9
CAT: 3

Reproduction

Analysis started: 2020-07-12 14:14:43.037704
Analysis finished: 2020-07-12 14:14:57.626196
Duration: 14.59 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

loc.details has a high cardinality: 290 distinct values (High cardinality)
location has a high cardinality: 1378 distinct values (High cardinality)
deposit_amount_2012 is highly correlated with deposit_amount_2011 and 3 other fields (High correlation)
deposit_amount_2011 is highly correlated with deposit_amount_2012 and 2 other fields (High correlation)
deposit_amount_2013 is highly correlated with deposit_amount_2011 and 5 other fields (High correlation)
deposit_amount_2014 is highly correlated with deposit_amount_2011 and 5 other fields (High correlation)
deposit_amount_2015 is highly correlated with deposit_amount_2012 and 4 other fields (High correlation)
deposit_amount_2016 is highly correlated with deposit_amount_2013 and 3 other fields (High correlation)
deposit_amount_2017 is highly correlated with deposit_amount_2013 and 3 other fields (High correlation)
id has unique values (Unique)

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct count: 3187
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1934.8249137119549
Minimum: 1
Maximum: 3772
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 188.3
Q1: 935.5
median: 2059
Q3: 2879.5
95-th percentile: 3558.7
Maximum: 3772
Range: 3771
Interquartile range (IQR): 1944

Descriptive statistics

Standard deviation: 1095.640799
Coefficient of variation (CV): 0.5662738736
Kurtosis: -1.244757168
Mean: 1934.824914
Median Absolute Deviation (MAD): 967
Skewness: -0.1229164455
Sum: 6166287
Variance: 1200428.76
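Several of the figures above follow from others by simple arithmetic. The sketch below recomputes the range, IQR, coefficient of variation, and variance of id from the reported minimum, maximum, quartiles, mean, and standard deviation. It uses the standard textbook definitions, which are assumed here to match what the report computes.

```python
# Values copied from the id statistics in this report.
minimum, maximum = 1, 3772
q1, q3 = 935.5, 2879.5
mean = 1934.824914
std = 1095.640799

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr)
print(f"CV = {cv:.10f}")
print(f"Variance = {variance:.2f}")
```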
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
2047                 1      < 0.1%
597                  1      < 0.1%
593                  1      < 0.1%
2640                 1      < 0.1%
589                  1      < 0.1%
2636                 1      < 0.1%
585                  1      < 0.1%
2632                 1      < 0.1%
581                  1      < 0.1%
2628                 1      < 0.1%
Other values (3177)  3177   99.7%

Value                Count  Frequency (%)
1                    1      < 0.1%
2                    1      < 0.1%
4                    1      < 0.1%
5                    1      < 0.1%
6                    1      < 0.1%

Value                Count  Frequency (%)
3772                 1      < 0.1%
3768                 1      < 0.1%
3767                 1      < 0.1%
3766                 1      < 0.1%
3765                 1      < 0.1%

deposit_amount_2011
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2950
Unique (%): 92.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45305.863131471604
Minimum: 156.0
Maximum: 159399.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 156
5-th percentile: 3671.7
Q1: 14126.85
median: 36636
Q3: 67104.45
95-th percentile: 119540.85
Maximum: 159399
Range: 159243
Interquartile range (IQR): 52977.6

Descriptive statistics

Standard deviation: 36504.93071
Coefficient of variation (CV): 0.8057440735
Kurtosis: -0.07808347874
Mean: 45305.86313
Median Absolute Deviation (MAD): 25084.5
Skewness: 0.8627623671
Sum: 144389785.8
Variance: 1332609966
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
2973.6               14     0.4%
3786                 10     0.3%
4476.9               9      0.3%
3930.6               8      0.3%
3595.8               8      0.3%
2461.8               7      0.2%
3841.5               5      0.2%
4579.8               5      0.2%
3257.7               5      0.2%
3111.9               4      0.1%
Other values (2940)  3112   97.6%

Value                Count  Frequency (%)
156                  1      < 0.1%
172.5                1      < 0.1%
274.5                1      < 0.1%
562.5                1      < 0.1%
574.5                1      < 0.1%

Value                Count  Frequency (%)
159399               1      < 0.1%
158926.5             1      < 0.1%
157300.5             1      < 0.1%
156943.5             1      < 0.1%
156646.5             1      < 0.1%

deposit_amount_2012
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2988
Unique (%): 93.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48864.11700658927
Minimum: 117.0
Maximum: 156289.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 117
5-th percentile: 3455.79
Q1: 17355.75
median: 41398.5
Q3: 71778
95-th percentile: 123168.3
Maximum: 156289.5
Range: 156172.5
Interquartile range (IQR): 54422.25

Descriptive statistics

Standard deviation: 37318.30671
Coefficient of variation (CV): 0.763715974
Kurtosis: -0.2678550215
Mean: 48864.11701
Median Absolute Deviation (MAD): 26448
Skewness: 0.7433380045
Sum: 155729940.9
Variance: 1392656016
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3420.3               20     0.6%
2315.1               10     0.3%
10397.1              8      0.3%
5062.8               8      0.3%
3845.1               6      0.2%
4722.3               5      0.2%
3442.5               5      0.2%
2613                 5      0.2%
6099                 5      0.2%
5127                 4      0.1%
Other values (2978)  3111   97.6%

Value                Count  Frequency (%)
117                  1      < 0.1%
180                  1      < 0.1%
213                  1      < 0.1%
214.5                1      < 0.1%
298.5                1      < 0.1%

Value                Count  Frequency (%)
156289.5             1      < 0.1%
154740               1      < 0.1%
154695               1      < 0.1%
154348.5             1      < 0.1%
153324               1      < 0.1%

deposit_amount_2013
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3064
Unique (%): 96.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56306.26501411986
Minimum: 82.5
Maximum: 192361.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 82.5
5-th percentile: 5382.72
Q1: 24257.25
median: 48889.5
Q3: 81142.5
95-th percentile: 133243.05
Maximum: 192361.5
Range: 192279
Interquartile range (IQR): 56885.25

Descriptive statistics

Standard deviation: 39592.22015
Coefficient of variation (CV): 0.7031583455
Kurtosis: -0.3370600761
Mean: 56306.26501
Median Absolute Deviation (MAD): 27676.5
Skewness: 0.6836752758
Sum: 179448066.6
Variance: 1567543896
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3962.1               10     0.3%
1543.2               8      0.3%
3132.3               7      0.2%
8201.4               6      0.2%
6541.2               6      0.2%
3147                 5      0.2%
3786                 4      0.1%
5737.2               4      0.1%
2328.9               4      0.1%
7769.4               3      0.1%
Other values (3054)  3130   98.2%

Value                Count  Frequency (%)
82.5                 1      < 0.1%
142.5                1      < 0.1%
156                  1      < 0.1%
277.5                1      < 0.1%
424.5                1      < 0.1%

Value                Count  Frequency (%)
192361.5             1      < 0.1%
178746               1      < 0.1%
165862.5             1      < 0.1%
164914.5             1      < 0.1%
163101               1      < 0.1%

deposit_amount_2014
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3097
Unique (%): 97.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63760.04581110762
Minimum: 108.0
Maximum: 192744.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 108
5-th percentile: 8998.74
Q1: 31295.25
median: 55180.5
Q3: 90468.75
95-th percentile: 143629.05
Maximum: 192744
Range: 192636
Interquartile range (IQR): 59173.5

Descriptive statistics

Standard deviation: 41509.51272
Coefficient of variation (CV): 0.6510270215
Kurtosis: -0.3842805608
Mean: 63760.04581
Median Absolute Deviation (MAD): 28107
Skewness: 0.6530100331
Sum: 203203266
Variance: 1723039646
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
2909.7               6      0.2%
2368.8               5      0.2%
8108.7               4      0.1%
7188                 3      0.1%
1695                 3      0.1%
51912                3      0.1%
2277.3               3      0.1%
52413                3      0.1%
77794.5              2      0.1%
45939                2      0.1%
Other values (3087)  3153   98.9%

Value                Count  Frequency (%)
108                  1      < 0.1%
274.5                1      < 0.1%
364.5                1      < 0.1%
487.5                1      < 0.1%
522                  1      < 0.1%

Value                Count  Frequency (%)
192744               1      < 0.1%
183061.5             1      < 0.1%
179148               1      < 0.1%
177333               1      < 0.1%
177036               1      < 0.1%

deposit_amount_2015
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3142
Unique (%): 98.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72616.43109507374
Minimum: 1218.0
Maximum: 231750.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 1218
5-th percentile: 14710.65
Q1: 37819.5
median: 63582
Q3: 101347.5
95-th percentile: 156677.1
Maximum: 231750
Range: 230532
Interquartile range (IQR): 63528

Descriptive statistics

Standard deviation: 43995.33058
Coefficient of variation (CV): 0.6058591688
Kurtosis: -0.3119043579
Mean: 72616.4311
Median Absolute Deviation (MAD): 29440.5
Skewness: 0.6683973673
Sum: 231428565.9
Variance: 1935589113
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
5850.6               5      0.2%
8505                 2      0.1%
45903                2      0.1%
61992                2      0.1%
25114.5              2      0.1%
102274.5             2      0.1%
111298.5             2      0.1%
21894                2      0.1%
142119               2      0.1%
72429                2      0.1%
Other values (3132)  3164   99.3%

Value                Count  Frequency (%)
1218                 1      < 0.1%
1369.5               1      < 0.1%
1504.5               1      < 0.1%
1575                 1      < 0.1%
1704                 1      < 0.1%

Value                Count  Frequency (%)
231750               1      < 0.1%
222052.5             1      < 0.1%
201510               1      < 0.1%
201225               1      < 0.1%
199099.5             1      < 0.1%

deposit_amount_2016
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3140
Unique (%): 98.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82683.86037025416
Minimum: 6502.5
Maximum: 268407.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 6502.5
5-th percentile: 20894.4
Q1: 45667.5
median: 72400.5
Q3: 112866
95-th percentile: 175002.6
Maximum: 268407
Range: 261904.5
Interquartile range (IQR): 67198.5

Descriptive statistics

Standard deviation: 47620.2991
Coefficient of variation (CV): 0.5759322181
Kurtosis: -0.1483576894
Mean: 82683.86037
Median Absolute Deviation (MAD): 31339.5
Skewness: 0.7310038941
Sum: 263513463
Variance: 2267692887
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
46591.5              2      0.1%
75835.5              2      0.1%
77362.5              2      0.1%
139911               2      0.1%
80713.5              2      0.1%
33325.5              2      0.1%
86626.5              2      0.1%
72898.5              2      0.1%
110755.5             2      0.1%
66612                2      0.1%
Other values (3130)  3167   99.4%

Value                Count  Frequency (%)
6502.5               1      < 0.1%
6990                 1      < 0.1%
7714.5               1      < 0.1%
8109                 1      < 0.1%
8133                 1      < 0.1%

Value                Count  Frequency (%)
268407               1      < 0.1%
249486               1      < 0.1%
245535               1      < 0.1%
239257.5             1      < 0.1%
234763.5             1      < 0.1%

loc.details
Categorical

HIGH CARDINALITY

Distinct count: 290
Unique (%): 9.1%
Missing: 0
Missing (%): 0.0%
Memory size: 24.9 KiB
Los Angeles          185
Cook                 139
Orange               102
Harris               93
Maricopa             88
Other values (285)   2580
Value                Count  Frequency (%)
Los Angeles          185    5.8%
Cook                 139    4.4%
Orange               102    3.2%
Harris               93     2.9%
Maricopa             88     2.8%
King                 73     2.3%
San Diego            71     2.2%
Clark                69     2.2%
Miami-Dade           61     1.9%
Marion               56     1.8%
Other values (280)   2250   70.6%

Length

Max length: 16
Median length: 7
Mean length: 7.37307813
Min length: 3

location
Categorical

HIGH CARDINALITY

Distinct count: 1378
Unique (%): 43.2%
Missing: 0
Missing (%): 0.0%
Memory size: 24.9 KiB
Chicago              85
Houston              64
Indianapolis         44
New York City        43
Los Angeles          40
Other values (1373)  2911
Value                Count  Frequency (%)
Chicago              85     2.7%
Houston              64     2.0%
Indianapolis         44     1.4%
New York City        43     1.3%
Los Angeles          40     1.3%
Las Vegas            33     1.0%
Miami                33     1.0%
San Antonio          29     0.9%
Seattle              29     0.9%
San Francisco        26     0.8%
Other values (1368)  2761   86.6%

Length

Max length: 22
Median length: 9
Mean length: 9.068716661
Min length: 4

state
Categorical

Distinct count: 22
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 24.9 KiB
CA                   719
TX                   341
NY                   337
FL                   337
IL                   219
Other values (17)    1234
Value                Count  Frequency (%)
CA                   719    22.6%
TX                   341    10.7%
NY                   337    10.6%
FL                   337    10.6%
IL                   219    6.9%
WA                   190    6.0%
NJ                   167    5.2%
IN                   158    5.0%
OR                   111    3.5%
AZ                   110    3.5%
Other values (12)    498    15.6%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

deposit_amount_2017
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3140
Unique (%): 98.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 124025.79055538123
Minimum: 9753.75
Maximum: 402610.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 9753.75
5-th percentile: 31341.6
Q1: 68501.25
median: 108600.75
Q3: 169299
95-th percentile: 262503.9
Maximum: 402610.5
Range: 392856.75
Interquartile range (IQR): 100797.75

Descriptive statistics

Standard deviation: 71430.44865
Coefficient of variation (CV): 0.5759322181
Kurtosis: -0.1483576894
Mean: 124025.7906
Median Absolute Deviation (MAD): 47009.25
Skewness: 0.7310038941
Sum: 395270194.5
Variance: 5102308995
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
74751.75             2      0.1%
95379.75             2      0.1%
49988.25             2      0.1%
69887.25             2      0.1%
66321                2      0.1%
100030.5             2      0.1%
53579.25             2      0.1%
82383.75             2      0.1%
151852.5             2      0.1%
87833.25             2      0.1%
Other values (3130)  3167   99.4%

Value                Count  Frequency (%)
9753.75              1      < 0.1%
10485                1      < 0.1%
11571.75             1      < 0.1%
12163.5              1      < 0.1%
12199.5              1      < 0.1%

Value                Count  Frequency (%)
402610.5             1      < 0.1%
374229               1      < 0.1%
368302.5             1      < 0.1%
358886.25            1      < 0.1%
352145.25            1      < 0.1%

age_of_bank
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.37088170693443
Minimum: 1
Maximum: 191
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 24
median: 97
Q3: 97
95-th percentile: 97
Maximum: 191
Range: 190
Interquartile range (IQR): 73

Descriptive statistics

Standard deviation: 38.98606483
Coefficient of variation (CV): 0.5462460866
Kurtosis: -1.028681742
Mean: 71.37088171
Median Absolute Deviation (MAD): 0
Skewness: -0.7697950624
Sum: 227459
Variance: 1519.913251
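A MAD of 0 may look like an error, but it is what one would expect when more than half of the observations equal the median: here 97 is the median and accounts for 58.4% of the rows. The sketch below illustrates this with a small made-up sample (the values are hypothetical, not taken from the dataset), using the usual definition of MAD as the median of absolute deviations from the median.

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Hypothetical toy sample in which 97 is the majority value.
toy_ages = [5, 24, 97, 97, 97, 97, 191]

print(statistics.median(toy_ages))
print(median_absolute_deviation(toy_ages))
```

Because four of the seven deviations are zero, their median is zero as well, even though the sample is far from constant.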
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
97                   1861   58.4%
5                    93     2.9%
13                   65     2.0%
12                   63     2.0%
8                    60     1.9%
6                    56     1.8%
82                   55     1.7%
4                    53     1.7%
14                   50     1.6%
9                    49     1.5%
Other values (101)   782    24.5%

Value                Count  Frequency (%)
1                    1      < 0.1%
2                    28     0.9%
3                    41     1.3%
4                    53     1.7%
5                    93     2.9%

Value                Count  Frequency (%)
191                  1      < 0.1%
167                  1      < 0.1%
163                  1      < 0.1%
162                  2      0.1%
161                  2      0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
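The definition above can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, and the inputs are the deposit_amount_2011 and deposit_amount_2012 values of the first five sample rows in this report. This is a minimal sketch of the textbook formula, not the code the report itself runs.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# First five sample rows: deposit_amount_2011 and deposit_amount_2012.
deposits_2011 = [32079.0, 83181.0, 68511.0, 96271.5, 93837.0]
deposits_2012 = [35971.5, 84846.0, 73932.0, 108325.5, 101592.0]

r = pearson_r(deposits_2011, deposits_2012)
print(f"r = {r:.4f}")

# Shifting and rescaling one variable leaves r unchanged.
rescaled = [2.0 * y + 10.0 for y in deposits_2012]
print(f"r after rescaling = {pearson_r(deposits_2011, rescaled):.4f}")
```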

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
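A minimal sketch of this procedure: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names are illustrative, and giving tied values the average of their positions is a common convention assumed here rather than something the report specifies.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]
print(spearman_rho(xs, ys))   # ranks agree exactly
print(pearson_r(xs, ys))      # slightly below 1 because the relation is not linear
```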

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
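The pair-counting description corresponds to the variant usually called tau-a, sketched below. The function name is illustrative. Statistical libraries often apply a tie correction (tau-b) instead, so the figure in a generated report may differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```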

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

      id    deposit_amount_2011  deposit_amount_2012  deposit_amount_2013  deposit_amount_2014  deposit_amount_2015  deposit_amount_2016  loc.details  location             state  deposit_amount_2017  age_of_bank
0     1     32079.0              35971.5              37237.5              40362.0              46021.5              46020.0              Waukesha     Wales                WI     69030.00             106
1     2     83181.0              84846.0              97098.0              110284.5             122035.5             133905.0             Washington   Germantown           WI     200857.50            97
2     4     68511.0              73932.0              79876.5              105603.0             112113.0             110755.5             Waukesha     Pewaukee             WI     166133.25            97
3     5     96271.5              108325.5             104880.0             121054.5             113956.5             109837.5             Waukesha     Waukesha             WI     164756.25            97
4     6     93837.0              101592.0             118270.5             140280.0             150987.0             168742.5             Waukesha     Waukesha             WI     253113.75            97
5     8     126933.0             144072.0             155919.0             164754.0             181075.5             184749.0             Waukesha     New Berlin           WI     277123.50            56
6     9     72700.5              73044.0              82053.0              85413.0              83767.5              87390.0              Waukesha     Oconomowoc           WI     131085.00            84
7     10    73921.5              73033.5              73011.0              78331.5              80385.0              83619.0              Waukesha     Butler               WI     125428.50            97
8     11    46113.0              47869.5              49678.5              62046.0              68752.5              82890.0              Waukesha     Muskego              WI     124335.00            97
9     12    44221.5              46537.5              52206.0              60166.5              63582.0              70984.5              Waukesha     Waukesha             WI     106476.75            33

Last rows

      id    deposit_amount_2011  deposit_amount_2012  deposit_amount_2013  deposit_amount_2014  deposit_amount_2015  deposit_amount_2016  loc.details  location             state  deposit_amount_2017  age_of_bank
3177  3751  4476.9               5062.8               5737.2               7188.0               5850.6               9285.0               Cobb         Atlanta              GA     13927.50             97
3178  3752  5608.8               7643.7               9831.0               9504.0               10885.2              14848.5              Somerset     Bernardsville        NJ     22272.75             97
3179  3756  3629.7               4454.1               6974.7               7107.3               7743.9               11262.0              Sumter       The Villages         FL     16893.00             1
3180  3763  4476.9               5062.8               5737.2               7188.0               5850.6               9238.5               Palm Beach   Palm Beach Gardens   FL     13857.75             97
3181  3764  4476.9               5062.8               5737.2               7188.0               5850.6               9229.5               Duval        Jacksonville         FL     13844.25             2
3182  3765  3113.1               5125.5               7568.7               8907.3               9931.5               12067.5              Cook         Glencoe              IL     18101.25             2
3183  3766  3860.1               5302.8               7048.5               8007.0               9378.9               12453.0              Westchester  Bedford Hills        NY     18679.50             2
3184  3767  4476.9               5062.8               4995.0               5373.6               5524.5               9711.0               Sarasota     Sarasota             FL     14566.50             97
3185  3768  4476.9               5062.8               3879.3               4504.5               4047.0               8109.0               San Mateo    South San Francisco  CA     12163.50             97
3186  3772  5549.1               7395.0               7934.4               8913.0               9235.5               15699.0              Manatee      Bradenton            FL     23548.50             97
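In the sample rows, each deposit_amount_2017 value appears to be exactly 1.5 times the deposit_amount_2016 value in the same row. That would also account for the two columns reporting identical coefficient of variation, kurtosis, and skewness, since those statistics do not change under a positive rescaling. The sketch below checks the ratio on six of the sample rows and on the reported means and standard deviations. It is an observation from the figures in this report, not something the report states, and it says nothing about rows that are not shown.

```python
# (deposit_amount_2016, deposit_amount_2017) pairs copied from the sample rows.
sample_pairs = [
    (46020.0, 69030.00),
    (133905.0, 200857.50),
    (110755.5, 166133.25),
    (9285.0, 13927.50),
    (14848.5, 22272.75),
    (15699.0, 23548.50),
]

ratios = [amount_2017 / amount_2016 for amount_2016, amount_2017 in sample_pairs]
print(ratios)

# The reported means and standard deviations scale by the same factor.
mean_2016, mean_2017 = 82683.86037, 124025.7906
std_2016, std_2017 = 47620.2991, 71430.44865
print(mean_2017 / mean_2016, std_2017 / std_2016)
```

If the relationship holds for every row, one of the two columns carries no additional information for modelling, which is worth confirming on the full dataset before use.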